Abstract. The global testing problem studied in this paper is to seek a definite answer to whether a
system of concurrent black-boxes has an observable behavior in a given finite (but possibly huge) set Bad.
We introduce a novel approach to solve the problem that does not require integration testing. Instead, in 
our approach, the global testing problem is reduced to testing individual black-boxes in the system one by 
one in some given order. Using an automata-theoretic approach, test sequences for each individual black- 
box are generated from the system's description as well as the test results of black-boxes prior to this 
black-box in the given order. In contrast to the conventional compositional/modular verification/testing 
approaches, our approach is essentially decompositional. Also, our technique is complete, sound, and can 
be carried out automatically. Our experimental results show that the total number of tests needed to solve
the global testing problem is small even for an extremely large Bad.

1 Introduction 

Testing a concurrent and component-based system is notoriously difficult [16, 14]. One difficulty comes from
the system's nondeterminism and the synchronizations among concurrently running components. Another 
difficulty lies in the fact that, in a component-based system, its constituent components could be some ex- 
ternally obtained software components (such as COTS products) whose source codes and design details are 
usually not available. In that case, traditional white-box techniques (like static analysis) are not applicable to 
analyzing the system. These components can be readily treated as black-boxes whose models (both at code 
level and design level) are unknown. In this paper, we study a testing problem for such a system of concurrent 
black-boxes. 

In our setup, a system of concurrent black-boxes consists of a host system (called the gluer) and a number 
of black-boxes. Each of the gluer and the black-boxes is called a unit (or a component), which is a (possibly 
nondeterministic and infinite-state) labeled transition system, each of whose labels represents either an ob- 
servable action or an internal action. All the units in the system run concurrently and synchronize on a number 
of observable actions. The gluer is a fully specified finite-state unit. For each black-box, however, except for 
its interface (i.e., the set of its observable actions), everything else is unknown, while its implementation is 
always available and can be black-box tested. A global bad behavior is an observable behavior of the system 
in a given finite set Bad. Finally, the global testing problem studied in this paper is to verify (with a definite 
answer) that, for the given set Bad, the system does not have a global bad behavior. 

A straightforward approach to solve the global testing problem is to perform integration testing over the 
system as a whole and see if the system exhibits a bad behavior. However, there are fundamental difficulties 
with this approach. For instance, in some applications [30], integration testing may not be applicable at all.
Even when integration testing is possible in some situations, the system itself is often nondeterministic. The 
combinatorial blow-up on the number of the executions caused by nondeterministic interleavings among the 
concurrent units in the system generally makes it infeasible to do thorough integration testing, while we are 
looking for a definite answer to the global testing problem. For the same reason, even when one has a way
to handle the nondeterminism [31], the size of the given set Bad (which could be very large, e.g., more than
$10^{24}$ in some of our experiments shown later) may also make exhaustive integration testing infeasible.

A less straightforward approach is to combine testing with some formal method. For instance, one can 
extensively test each black-box alone and try to build [26] a partial model of the black-box from the test
results. Then, one can run a formal method like model-checking on the partial system model built from the 
partial models of the black-boxes to solve the global testing problem. However, this approach is also difficult 
to implement. For instance, it is hard to choose effective test sequences to build a partial model of a black-box, 
and it is also hard to know when the tests over a black-box are adequate. Moreover, the partial (and hence 
approximated) system model might not help us obtain a definite answer to the global testing problem. To 
avoid the above difficulties, one may also try, using some formal method, to derive an expectation condition 
over a black-box's behaviors such that: when every black-box behaves as expected, the system guarantees 
to not have a global bad behavior. Then the expectation conditions can be used to generate test sequences for
the black-boxes. However, the interactions among the concurrent black-boxes make it difficult to derive such
conditions automatically (see Section 2 for related work on assume-guarantee style reasoning).

In this paper, we introduce a novel approach (called the "push-in" technique) to solve the problem, which 
does not entail any integration testing. Instead, in our approach, the global testing problem is reduced to 
testing individual black-boxes in the system one by one in some given order. Using an automata-theoretic 
approach, test sequences for each individual black-box are generated from the system's description as well 
as the test results of black-boxes prior to the black-box in the given order. Suppose that $B_1, \ldots, B_k$ represent
the concurrent black-boxes in a system. The first step of our approach is to compute an auxiliary set $A_1$ of
sequences of observable actions for black-boxes $B_1, \ldots, B_k$ and a set $U_1$ of test sequences for black-box
$B_1$. Then we test the black-box $B_1$ with the test sequences in $U_1$ and collect all successful test sequences into a
surviving set $SUV_1$. In the second step, from the surviving set $SUV_1$ and the auxiliary set $A_1$, we compute
the auxiliary set $A_2$ (for black-boxes $B_2, \ldots, B_k$) and the test sequence set $U_2$ for black-box $B_2$. Again, after
testing black-box $B_2$ with the test sequences in $U_2$, we collect all successful test sequences into a surviving
set $SUV_2$. Subsequent steps follow similarly, and eventually, in the last step (i.e., step $k$), the global testing
problem will be decided from the surviving sets. That is, the system has no global bad behavior iff, for some
$1 \le i \le k$, the surviving set $SUV_i$ is empty. We also provide a procedure to recover a global bad behavior
when the answer to the original problem is "no".

Since the sets (i.e., $U_i$ and $A_i$) are provably finite and, in many cases, huge, we use (finite) automata that
accept the sets as their symbolic representations, and standard automata operations are used to manipulate 
these sets. Also, the global testing problem is decomposed into a series of testing problems over each indi- 
vidual black-box in the system. Hence, our approach is an automata-theoretic and decompositional approach. 
Moreover, the "push-in" technique is both complete and sound, and can be carried out automatically. In par- 
ticular, we show that the technique is "optimal" in the sense that each test we run over a black-box has the 
potential to discover a global bad behavior (i.e., we never run useless tests). In general, exhaustive integration 
testing over a concurrent system is infeasible. However, our experiments show that, using the push-in tech- 
nique, we can completely solve the global testing problem with a substantially smaller number of tests over 
the individual black-boxes, even for an extremely large set Bad (some of our experiments performed only
about $10^5$ unit tests for a Bad of size more than $10^{24}$).



The rest of this paper is organized as follows. In Section 2, previous work related to this paper is discussed.
In Section 3, the formal definitions of a system of concurrent black-boxes and its global testing problem are
presented. In Section 4, the details of the push-in technique are given. In Section 5, a set of experiments is
run and the results are analyzed. Finally, Section 6 points out some future work.

2 Related Work 

The global testing problem is essentially a verification problem since we are looking for a definite answer. 
In the area of formal verification, there has been a long history of research on exploiting compositionality 
in system verification, and a common technique is to follow the "assume-guarantee" reasoning paradigm 
[21, 28, 19, 7, 2, 9, 8, 3]. However, a successful application of the paradigm depends on the correct assumptions
for the components in a system, which are, in general, formulated manually. Several authors suggest solutions 
to the problem of automated assumption generation [17, 18, 12, 15]. But the solutions require that the source
code and/or the finite-state design is available for a unit, which, unfortunately, is not the case in our setup. 
Although our push-in technique relies on black-box testing instead of an "assume-guarantee" mechanism, it 
can be extended to a system where a black-box is associated with environmental assumptions. 

In the area of software testing, researchers have long recognized the importance of combining formal 
methods (like model-checking) and testing techniques for system verification. Most work (e.g., [6, 10, 13])
stems from the spirit of specification-based testing, and utilizes model-checkers' capabilities of generating 
counter-examples from a system's specification to produce test-cases against an implementation. This ap- 
proach typically works at the unit level and lacks a "control" over the generated test-cases, since, unlike our 
technique, it does not have an overall and analytical characterization over all the useful (i.e., has the potential 
to recover a global bad behavior) test sequences. In contrast to our ideas, theoretical work in [26, 35] focuses
on complete testing over a single finite-state black-box with respect to a temporal property. The decompositional
approaches proposed in [1, 22] for model-checking feature-oriented software designs rely totally
on model-checking techniques (no testing) and could cause false negatives. Integration testing of concurrent
programs in [31, 20] relies on a specification (unavailable in our model) of a concurrent program.

The quality assurance problem for component-based software has attracted lots of attention in software 
engineering. However, most work considers the problem from component developers' point of view; i.e., 
how to ensure the quality of components before they are released (e.g., [25, 34, 33, 29]). This view, however,
is fundamentally insufficient: an extensively tested component (by the vendor) may still not perform as ex- 
pected in a specific deployment environment, since the deployment environments of a component could be 
quite different and diverse such that they may not be thoroughly tried by the vendor. Our push-in technique 
approaches this problem from system developers' point of view: how to ensure that multiple components 
function correctly in a host system where the components are deployed. In our technique, test sequences run 
on a component are customized to its specific deployment environment. Unlike our approach, frameworks 
like [4] require a complete specification of the component to be incorporated into a system, which is not
always possible. 

3 Preliminaries 

In this paper, we consider a system of (concurrent) black-boxes, which consists of a host system (called 
the gluer) and a collection of black-box components (simply called black-boxes). Each of the gluer and the 
black-boxes is a unit. In the rest of the section, we will present the model of a unit, the model of the system 
of black-boxes, and the global testing problem for the system. 



3.1 The Unit Model 



A unit is a nondeterministic labeled transition system $T$ that moves from one state to another while
performing an action. Formally, $T = (S, s_{init}, V, R)$, where $S$ is a countable (possibly infinite) set of states
with $s_{init} \in S$ being the initial state, $V$ is a finite set of actions, and $R \subseteq S \times V \times S$ defines the transition
relation. In particular, the action set $V$ is partitioned into three disjoint subsets: $\{\epsilon\}$ (an internal action), $\Pi$
(input actions), and $\Gamma$ (output actions). The set $\Sigma = \Pi \cup \Gamma$, i.e., the set of observable actions in $T$,
is called the interface of $T$. When the set $S$ of states is finite, $T$ is called a finite-state transition system.

A behavior of $T$ is a sequence of actions in $V$: $a_1 \ldots a_h$ (for some $h$) such that there is a sequence of
states $s_0 \ldots s_h$ with $s_0 = s_{init}$ and $(s_j, a_{j+1}, s_{j+1}) \in R$ for each $0 \le j \le h - 1$. An observable behavior of
$T$ is the result of dropping all the internal actions (i.e., $\epsilon$'s) from a behavior. Trivially, the empty string is an
observable behavior of any unit $T$.

A (unit) test sequence $\alpha$ for $T$ is a sequence of observable actions in $\Sigma$. A unit $T$ is considered to be a
black-box if its interface (i.e., $\Pi$ and $\Gamma$) is the only known part of its definition. In this case, we assume that
$T$ is testable. That is, there is a black-box testing procedure BBtest($T$, $\cdot$)¹ such that, for any test sequence
$\alpha$, BBtest($T$, $\alpha$) returns "yes" (i.e., $\alpha$ is successful) if $\alpha$ is an observable behavior of the unit $T$, and
BBtest($T$, $\alpha$) returns "no" (i.e., $\alpha$ is unsuccessful) otherwise.
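The definitions above can be sketched in Python as follows. This is only an illustration under a strong assumption: the transition relation of the unit is known and finite, which is exactly what a real black-box does not offer (a real BBtest would drive the implementation instead of inspecting $R$). The names `Unit`, `EPS`, `eps_closure`, `is_observable_behavior`, and the toy unit itself are hypothetical and not part of the paper.

```python
from dataclasses import dataclass

EPS = "eps"  # stands for the internal action


@dataclass(frozen=True)
class Unit:
    init: str          # initial state s_init
    trans: frozenset   # transition relation R as (state, action, state) triples


def eps_closure(unit, states):
    """All states reachable from `states` through internal actions only."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (p, a, q) in unit.trans:
            if p == s and a == EPS and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def is_observable_behavior(unit, seq):
    """True iff `seq` is what remains of some behavior after dropping EPS."""
    current = eps_closure(unit, {unit.init})
    for action in seq:
        step = {q for (p, a, q) in unit.trans if p in current and a == action}
        if not step:
            return False
        current = eps_closure(unit, step)
    return True


# Hypothetical toy unit (not the Comm implementation of Figure 5).
toy = Unit("s0", frozenset({
    ("s0", "send", "s1"), ("s1", EPS, "s2"), ("s2", "msg", "s3"),
    ("s3", "ack", "s4"), ("s4", "ok", "s0"),
}))
```

For this toy unit, `is_observable_behavior(toy, ["send", "msg", "ack"])` holds while the same call with `["send", "msg", "fail"]` does not, and the empty sequence is always accepted.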

For example, consider the black-box Comm in Figure 1, which has seven observable actions (in the figure,
we use suffixes ? and ! to distinguish input and output actions, respectively). Assume that the black-box is
implemented as shown in Figure 5. Clearly, send msg ack is a successful test sequence for Comm while
send msg fail is not.

Obviously, if one further assumes that the black-box is output deterministic (i.e., an input action sequence 
uniquely decides the corresponding output action sequence), then a test sequence for the black-box can be 
simply reduced to a sequence of input actions. However, there are testable units that are not necessarily output 
deterministic (e.g., [24, 32, 27]). Therefore, to make our algorithms (presented later) more general, we do not
apply this assumption (under which, obviously, our algorithm still applies). That's why in our definition, a 
test sequence is always a sequence of both input actions and output actions. 

3.2 The System Model 

A system of concurrent black-boxes consists of a gluer $G$ and a number of black-boxes $B_1, \ldots, B_k$, written
$Sys = G(B_1, \ldots, B_k)$. The gluer and the black-boxes are all units which run concurrently and synchronize
on certain actions. More precisely, $G$ is a fully specified (nondeterministic) finite-state unit
$G = (S_0, s^0_{init}, V_0, R_0)$, whose interface is $\Sigma_0 = \Pi_0 \cup \Gamma_0$. Each $B_i$ is a black-box unit $B_i = (S_i, s^i_{init}, V_i, R_i)$,
which is testable and whose interface (the only given part of the black-box) is $\Sigma_i = \Pi_i \cup \Gamma_i$. As mentioned
earlier, a black-box is not necessarily a finite-state unit. The state sets $S_0, \ldots, S_k$ are all disjoint. But the
interfaces $\Sigma_0, \ldots, \Sigma_k$ may not be disjoint: some units may share some common actions.

We use $\Sigma = \Sigma_0 \cup \ldots \cup \Sigma_k$ to denote all the observable actions in the system Sys (this implies that
each unit's observable actions are also observable in the system), and use $Sig(a)$, called the signature of $a$,
to denote the set of all $0 \le i \le k$ such that $a \in \Sigma_i$. Therefore, the signature indicates the units that share
action $a$.

The system Sys, which also works as a labeled transition system, is a Cartesian product of its units. That
is, $Sys = (S, s_{init}, V, R)$, where $S = S_0 \times \ldots \times S_k$ is the system's (global) state set; each unit starts from
its own initial state, i.e., the initial global state $s_{init}$ of the system is $(s^0_{init}, \ldots, s^k_{init})$; and $V = \{\epsilon\} \cup \Sigma$ with
$\Sigma = \Sigma_0 \cup \ldots \cup \Sigma_k$ is the system's action set.

1 The black-box testing procedure can be implemented in practice for a variety of transition systems @. 



The system's (global) transition relation $R \subseteq S \times V \times S$ is more complex. A global transition that moves
the system from a global state $(s_0, \ldots, s_k)$ to another global state $(s'_0, \ldots, s'_k)$ while performing an action
$a \in V$ is in $R$ iff one of the following conditions is satisfied:

- $a$ is an internal action (i.e., $\epsilon$), and exactly one unit in the system performs the internal action while the
remaining units do not move; i.e., $\exists\, 0 \le i \le k.\ (s_i, \epsilon, s'_i) \in R_i \wedge \forall\, 0 \le j \ne i \le k.\ s_j = s'_j$,

- $a$ is an observable action (i.e., $a \in \Sigma$), and all the units whose interfaces contain the observable action
$a$ synchronize over the action while the remaining units do not move; i.e.,
$\forall\, 0 \le i \le k.\ (i \in Sig(a) \wedge (s_i, a, s'_i) \in R_i) \vee (i \notin Sig(a) \wedge s_i = s'_i)$.

In other words, at any moment in the system Sys, exactly one unit performs an internal action, exactly one
unit performs an observable action that is not shared with any other unit, or multiple units synchronize over
a common observable action. Note that the synchronizations allowed in our model are quite flexible.
Not only can the units in a system synchronize over an output/input pair, as
most other system models allow, they can also synchronize over just an output action or just an input action, as
long as they can all perform this action (no matter whether it is an output or an input) at a certain global state. Also, in our model,
a synchronization can occur either between a pair of units or among more than two units; thus multi-cast
or broadcast is allowed. In some systems, multi-cast, broadcast, or synchronizations over only an
output action or only an input action may be undesirable. In that case, they can be easily eliminated by renaming
the actions. Note also that, in the system Sys, if a global transition is a synchronization
over a pair of output and input actions among some units, these two actions are considered to be one single
action, and we do not distinguish whether it is an output or an input but simply treat it as an action observable to the
environment.
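The two cases of the global transition relation can be sketched as a successor function. The sketch assumes every unit is given as an explicit finite set of triples, which holds for the gluer but, for a black-box, only serves illustration; `signature`, `global_successors`, and the three tiny units at the bottom are hypothetical names and data, not taken from the paper.

```python
EPS = "eps"  # stands for the internal action


def signature(action, interfaces):
    """Indices of the units whose interface contains `action` (Sig(a))."""
    return {i for i, sigma in enumerate(interfaces) if action in sigma}


def global_successors(state, action, relations, interfaces):
    """Global states reachable from `state` by one global transition on `action`.

    state      -- tuple (s_0, ..., s_k), one local state per unit
    relations  -- relations[i] is the set of (s, a, s') triples of unit i
    interfaces -- interfaces[i] is the set of observable actions of unit i
    """
    def local_moves(i, a):
        return [q for (p, b, q) in relations[i] if p == state[i] and b == a]

    if action == EPS:
        # Exactly one unit performs an internal action; the others stay put.
        return {state[:i] + (q,) + state[i + 1:]
                for i in range(len(state)) for q in local_moves(i, EPS)}

    sig = signature(action, interfaces)
    if not sig:
        return set()  # the action is not observable in this system
    results = {()}
    for i in range(len(state)):
        # Units in Sig(a) must all move on `action`; the rest do not move.
        options = local_moves(i, action) if i in sig else [state[i]]
        results = {prefix + (q,) for prefix in results for q in options}
    return results


# Hypothetical three-unit system: unit 0 plays the gluer.
interfaces = [{"fire", "data"}, {"fire", "data"}, {"fire"}]
relations = [
    {("g0", "fire", "g1"), ("g1", "data", "g0")},
    {("x0", "fire", "x1"), ("x1", EPS, "x2"), ("x2", "data", "x0")},
    {("t0", "fire", "t0")},
]
start = ("g0", "x0", "t0")
```

Here `fire` is shared by all three units, so a single global transition moves all of them at once, whereas `data` cannot occur until the second unit has taken its internal step.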

As defined earlier, a sequence $\alpha \in \Sigma^*$ is an observable behavior of the system Sys of black-boxes if the
system, treated as a transition system, has an execution from the initial global state to some global state and,
on the execution, $\alpha$ is the observable behavior.
Fig. 1. A Data Acquisition System



For example, consider the data acquisition system shown in Figure 1, which consists of one Gluer and
three black-box components: Timer, Sensor and Comm. The system works as follows. Once started, the
Timer keeps signaling a fire event whenever the set time interval runs out; the Timer can also be paused (resp.
resumed) by an incoming pause (resp. resume) event. The Sensor is supposed to respond to a fire event
by signaling a data event when the sensor's reading is ready; it also signals a serr event when something is
wrong inside the Sensor. The Comm component responds to a send event to send some data by signaling a
msg event to some underlying network; it responds to an ack (resp. nack) event by signaling an ok (resp. fail)
event to indicate that the data associated with a previous send event has been transmitted successfully (resp.
unsuccessfully) by the underlying network; it signals a cerr event when something is wrong inside Comm.
The Gluer (whose transition graph is depicted in Figure 2) simply relays data from Sensor to Comm; it
pauses the Timer when something is wrong with the Sensor or Comm, and after that, it resumes the Timer
when either an ok or a fail is received from Comm. Together, they constitute a data acquisition system, which
periodically transmits a reading of the Sensor through Comm via some underlying communication network.
In this system, the Gluer and the three components run concurrently and synchronize with each other by
sending and receiving those events (here, all synchronizations are over output/input pairs between two units).
The internal implementations of the three components are shown in Figure 3, Figure 4, and Figure 5, respectively².
It can be seen (though not obviously) that the following sequence is an observable behavior of the
system: fire fire serr pause data send msg ack ok resume fire, while the sequence fire fire serr data pause send is
not.

Fig. 2. The Gluer

Fig. 3. Internal implementation of Timer

When all the black-boxes are fully specified, our system model is roughly equivalent to the IOTS studied
in [27]. Our model is also closely related to I/O automata [23] (but ours is not input-enabled) and to interface
automata [9] (but ours, similar to the IOTS, makes synchronizations between units observable at the system
level). These observable synchronizations are the key to testing the behavior of a system of concurrent black-boxes,
where an abstract model (such as design or source code) of each black-box is unavailable.

Let $Bad \subseteq \Sigma^*$ be a given set of test sequences that are not supposed to be observable behaviors
of the system Sys. The global testing problem is to verify (with a definite answer) that none of the test
sequences in Bad is an observable behavior of the system. Clearly, in general, the problem cannot be solved
completely since the set Bad can be infinite and, for testing, only finitely many test sequences can be run.
Therefore, we assume that Bad is a finite set, which can be given as an explicit list of test sequences (e.g.,
Bad = {fire fire, fire fire data, fire data send fire}) or as a symbolic representation (e.g., Bad is all
sequences in regular expression fire data (fire)* send whose lengths are between 10 and 30).

2 Obviously, the push-in technique does not require these transition graphs, which are provided only for readers to
understand the system.

Fig. 4. Internal implementation of Sensor

4 The Push-in Technique 

In this section, we present the "push-in" technique to completely solve the global testing problem by performing
unit testing over each individual black-box in the system. A test sequence is a string or a word. A
finite set of test sequences is therefore a regular language and, in this paper, we use a (finite) automaton that
accepts the finite set as the symbolic representation of the set. Our push-in technique is automata-theoretic.
For each $1 \le i \le k$, the technique generates two automata: $U_i$ and $A_i$. Automaton $U_i$, called a unit test
sequence automaton, accepts words over alphabet $\Sigma_i$; i.e., it represents a set of test sequences for black-box $B_i$.
Automaton $A_i$, called an auxiliary automaton, accepts words over alphabet $\Sigma_i \cup \ldots \cup \Sigma_k$ (observable actions
of the black-boxes $B_i, \ldots, B_k$). Our push-in technique works in the following $k$ steps, where $i$ ranges from 1 to
$k$:

Step $i$. The step consists of two tasks:

(Automaton Generation) This task generates the unit test sequence automaton $U_i$ and the auxiliary automaton
$A_i$. We first generate the auxiliary automaton $A_i$. Initially, when $i = 1$, the generation is based on the
description of Sys (i.e., the gluer $G$ and the interfaces of $B_1, \ldots, B_k$) and the given set Bad. When $i > 1$, the
generation is based on the auxiliary automaton $A_{i-1}$ and the surviving set $SUV_{i-1}$ (see below) obtained
from the previous Step $i - 1$. If the empty string is accepted by the auxiliary automaton $A_i$, then the global
testing problem (whether none of the observable behaviors of the system Sys is in Bad) returns "no" (i.e., a bad behavior
of the system exists), and no further steps need to run. We then generate the unit test sequence automaton $U_i$
directly from the auxiliary automaton $A_i$ constructed earlier. This task is purely automata-theoretic and does
not involve any testing.

(Surviving Set Generation) In this second task, using BBtest, we perform unit testing over the black-box $B_i$
for all test sequences accepted by the test sequence automaton $U_i$ ($U_i$ always accepts a finite set). We use
$SUV_i$, called the surviving set, to denote the set of all the successful test sequences. If the surviving set is empty, then
the global testing problem returns "yes" (i.e., none of the observable behaviors of the system Sys is in Bad).
Otherwise, if $i < k$ (i.e., it is not the last step), we go to the following Step $i + 1$. If $i = k$ (i.e., it is the last
step and the surviving set is not empty), then the global testing problem returns "no" (i.e., some observable
behavior of the system Sys is indeed in Bad).
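The control flow of the $k$ steps can be sketched as follows. To stay short, the sketch swaps the symbolic automata for explicit finite sets of words, which is only workable for a small Bad; it also assumes the words accepted by the global test sequence automaton (constructed in Section 4.1) are already available as the set `global_bad`. The function names `project` and `push_in` and the callables standing in for BBtest are hypothetical.

```python
def project(word, alphabet):
    """Drop every symbol of `word` that is not in `alphabet`."""
    return tuple(a for a in word if a in alphabet)


def push_in(global_bad, interfaces, bbtests):
    """Return True for "yes" (no bad behavior) and False for "no".

    global_bad -- finite set of words (tuples) accepted by M_global
    interfaces -- [Sigma_1, ..., Sigma_k], interfaces of the black-boxes
    bbtests    -- bbtests[i](word) is True iff `word` is an observable
                  behavior of black-box i+1 (stand-in for BBtest)
    """
    k = len(interfaces)
    rest = set().union(*interfaces)
    aux = {project(w, rest) for w in global_bad}            # auxiliary set A_1
    for i in range(k):
        if () in aux:
            return False        # empty word accepted: a bad behavior exists
        tests = {project(w, interfaces[i]) for w in aux}    # test set U_i
        surviving = {u for u in tests if bbtests[i](u)}     # surviving set SUV_i
        if not surviving:
            return True         # empty surviving set: answer is "yes"
        remaining = set().union(*interfaces[i + 1:])
        aux = {project(w, remaining) for w in aux
               if project(w, interfaces[i]) in surviving}   # auxiliary set A_{i+1}
    return False                # last surviving set is non-empty


# Hypothetical two-black-box instance.
interfaces = [{"a", "b"}, {"c"}]
only_a = lambda w: all(x == "a" for x in w)   # box 1 can only do a's
at_most_one_c = lambda w: len(w) <= 1         # box 2 can do at most one c
nothing = lambda w: len(w) == 0               # box 2 can do nothing observable
```

With `global_bad = {("a", "c"), ("b", "c", "c")}`, the first box rules out the second word, and the answer then depends on whether the second box can perform `c`.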



In the rest of this section, we will clarify how Automata Generation and Surviving Set Generation in the
$k$ steps can be done. Since our technique depends heavily on automata theory, we first establish its
theoretical foundation before proceeding further.

4.1 Theory Foundation of the Push-in Technique 

Let us first make a pessimistic (the name is borrowed from the discussions in [9]) modification of the original
system Sys by assuming that each black-box $B_i$, $1 \le i \le k$, can demonstrate any observable behavior in $\Sigma_i^*$
(recalling that $\Sigma_i$ is the interface of the black-box). The resulting system is denoted by $\widehat{Sys}$. Clearly, every
observable behavior of Sys is also an observable behavior of $\widehat{Sys}$ (but the reverse is not necessarily true).

Notice that $\widehat{Sys}$ does not have any black-boxes since the original black-box $B_i$, after the pessimistic
modification, can be considered as a finite-state unit $\hat{B}_i$ with only one state, where each action in $\Sigma_i \cup \{\epsilon\}$
is a label on a transition from the state back to itself. According to the semantics defined in
Section 3.2, it is not hard to see that $\widehat{Sys}$ itself, after the composition of the gluer $G$ with all the one-state
units $\hat{B}_1, \ldots, \hat{B}_k$, is a finite-state transition system with $|G|$ (the number of states in the gluer) states and with
actions in $\Sigma \cup \{\epsilon\}$. (Recall that $\Sigma = \Sigma_0 \cup \ldots \cup \Sigma_k$ is the union of all observable actions in the gluer and
the black-boxes.) The pessimistic system can also be treated as a pessimistic (finite) automaton by making
each state an accepting state and each $\epsilon$-transition an $\epsilon$-move. In this way, the language (a subset of $\Sigma^*$)
accepted by the automaton is exactly the set of all observable behaviors of the pessimistic system.

As we have mentioned earlier, the set $Bad \subseteq \Sigma^*$ is a finite and hence regular set. Suppose that the
symbolic representation of the set is given as an automaton $M_{Bad}$ (whose number of states is written $|M_{Bad}|$);
i.e., the language accepted by $M_{Bad}$ is exactly the set Bad.

Using a standard Cartesian product construction, one can build an automaton $M_{global}$, called the global
test sequence automaton, to accept the intersection of the language accepted by the pessimistic automaton
and the language accepted by the automaton $M_{Bad}$. That is, $M_{global}$ accepts exactly the bad observable
behaviors of the pessimistic system. Clearly, the number of states in $M_{global}$ is at most $|G| \cdot |M_{Bad}|$.
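For a Bad given as an explicit list, the language of the global test sequence automaton can be computed without building the product, by filtering Bad through the pessimistic automaton. The sketch below takes that shortcut, so it is a stand-in for the Cartesian product construction rather than an implementation of it. It assumes every word uses only observable actions of the system; `pessimistic_accepts`, `global_bad_words`, and the toy gluer are hypothetical.

```python
EPS = "eps"  # stands for the internal action


def pessimistic_accepts(gluer_init, gluer_trans, gluer_interface, word):
    """True iff `word` is an observable behavior of the pessimistic system.

    Every black-box is replaced by a one-state unit that can always perform
    any of its actions, so only the gluer constrains the word: an action
    outside the gluer's interface is always possible, and any other action
    must be a move of the gluer.
    """
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for (p, a, q) in gluer_trans:
                if p == s and a == EPS and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = closure({gluer_init})
    for a in word:
        if a not in gluer_interface:
            continue  # performed by black-boxes only; the gluer does not move
        current = closure({q for (p, b, q) in gluer_trans
                           if p in current and b == a})
        if not current:
            return False
    return True


def global_bad_words(bad, gluer_init, gluer_trans, gluer_interface):
    """Words of Bad that the pessimistic system can exhibit."""
    return {w for w in bad
            if pessimistic_accepts(gluer_init, gluer_trans, gluer_interface, w)}


# Hypothetical gluer: fire, then data, then send, in a loop.
gluer_trans = {("g0", "fire", "g1"), ("g1", "data", "g2"), ("g2", "send", "g0")}
gluer_interface = {"fire", "data", "send"}
bad = {("fire", "data", "send"), ("fire", "send"), ("fire", "msg", "data")}
```

In this toy instance the word `fire send` is discarded by the gluer alone, so no black-box ever needs to be tested for it.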

For a word $\alpha \in \Sigma^*$, we use $\alpha\downarrow_{\Sigma_i}$, $1 \le i \le k$, to denote the result of dropping all symbols not in $\Sigma_i$
from $\alpha$. That is, if $\alpha$ is an observable behavior of the system Sys, then $\alpha\downarrow_{\Sigma_i}$ is the corresponding observable
behavior of black-box $B_i$. The theoretical foundation of our push-in technique can be summarized in the following
theorem, which can be shown using the semantics defined in Section 3.2.

Theorem 1. For any global test sequence $\alpha$ in $\Sigma^*$, the following two items are equivalent:

(1) $\alpha$ is a bad (i.e., in Bad) observable behavior of the system Sys of black-boxes $B_1, \ldots, B_k$,

(2) $\alpha$ is accepted by the global test sequence automaton $M_{global}$, and each of the following $k$ conditions
holds:

(2.1) $\alpha\downarrow_{\Sigma_1}$ is an observable behavior of $B_1$,

...

(2.k) $\alpha\downarrow_{\Sigma_k}$ is an observable behavior of $B_k$.

We use "class C" to denote the set of all $\alpha$ that satisfy Theorem 1(2). Obviously, the global testing problem
(i.e., whether there is no bad behavior in Sys) is equivalent to the emptiness of class C.

In the push-in technique, the jobs of Step 1, ..., Step $k$ are to establish the emptiness of class C using
both automata theory and black-box testing. One naive approach to deciding emptiness is to use Theorem 1(2)
directly: for every global test sequence $\alpha$ accepted by $M_{global}$ (note that $M_{global}$ accepts a finite
language), use black-box testing to make sure that one of the conditions (2.i), $1 \le i \le k$, is false. This
naive approach works, but inefficiently: when $M_{global}$ accepts a huge set (such as more than
$10^{24}$ in our experiments shown later), trying every such element is not only infeasible but also unnecessary.
Our approach aims at eliminating the inefficiency. We do not pick a global test sequence
$\alpha$. Instead, we compute the test sequences run on black-box $B_i$ from the testing results on black-box $B_{i-1}$ in
the previous Step $i - 1$. As we have mentioned at the beginning of this section, each Step $i$ has two tasks to
perform: Automata Generation and Surviving Set Generation, which are presented in detail as follows.
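As a worked instance of the projection used in Theorem 1, take the observable behavior of the data acquisition system given in Section 3.2 and assume, from the prose description of Comm, that its interface consists of the seven actions send, msg, ack, nack, ok, fail and cerr (the authoritative interface is the one drawn in Figure 1):

```latex
\begin{align*}
\alpha &= \textit{fire fire serr pause data send msg ack ok resume fire},\\
\Sigma_{Comm} &= \{\textit{send}, \textit{msg}, \textit{ack}, \textit{nack}, \textit{ok}, \textit{fail}, \textit{cerr}\}
  \quad \text{(assumed)},\\
\alpha\downarrow_{\Sigma_{Comm}} &= \textit{send msg ack ok}.
\end{align*}
```

Under this assumption, the condition of Theorem 1(2) that concerns Comm amounts to BBtest returning "yes" for the test sequence send msg ack ok on Comm.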

4.2 Automata Generation in Step i 

This task in Step $i$ is to generate two automata: the unit test sequence automaton $U_i$ and the auxiliary automaton
$A_i$.

Initially, when $i = 1$, $A_1$ is constructed as $A_1 = M_{global}\downarrow_{\Sigma_1 \cup \ldots \cup \Sigma_k}$; i.e., the result of dropping the label of every
transition in $M_{global}$ that is labeled with an observable action not in $\Sigma_1 \cup \ldots \cup \Sigma_k$ (so that the transition becomes an
$\epsilon$-move). $U_1$ is constructed as the automaton $U_1 = A_1\downarrow_{\Sigma_1}$ (i.e., the result of dropping, in the same way, every label
in $A_1$ that is an observable action not in $\Sigma_1$). Observe that $A_1$ accepts the language
$\mathcal{A}_1 = \{\alpha\downarrow_{\Sigma_1 \cup \ldots \cup \Sigma_k} : \alpha \text{ is accepted by } M_{global}\}$ and
$U_1$ accepts the language $\mathcal{U}_1 = \{\alpha\downarrow_{\Sigma_1} : \alpha \text{ is in } \mathcal{A}_1\}$. The number of states in either of the two automata, in the worst
case, is $|M_{global}|$.

When $i > 1$, the two automata $A_i$ and $U_i$ are constructed from the auxiliary automaton $A_{i-1}$ and the
surviving set $SUV_{i-1}$ obtained in the previous step. To construct $A_i$, we first build an automaton $suv_{i-1}$ to
accept the finite set $SUV_{i-1}$. Then, we build an intermediate automaton $M_{i-1}$ that works as follows: on an
input word in $(\Sigma_{i-1} \cup \ldots \cup \Sigma_k)^*$, $M_{i-1}$ simulates $A_{i-1}$ and $suv_{i-1}$ on the word in parallel. During
the simulation, whenever $suv_{i-1}$ reads an input symbol that is not in $\Sigma_{i-1}$ (note that $suv_{i-1}$ only accepts
words in $\Sigma_{i-1}^*$), it skips the input symbol. $M_{i-1}$ accepts the input word when both $A_{i-1}$ and $suv_{i-1}$ accept.
Finally, the auxiliary automaton $A_i$ is constructed as $A_i = M_{i-1}\downarrow_{\Sigma_i \cup \ldots \cup \Sigma_k}$. The unit test sequence automaton
$U_i$ is constructed as $U_i = A_i\downarrow_{\Sigma_i}$.

One can show that each of the two automata $A_i$ and $U_i$ has, in the worst case, $|A_{i-1}| \cdot |suv_{i-1}|$ states.
Also, $A_i$ accepts the language $\mathcal{A}_i = \{\alpha\downarrow_{\Sigma_i \cup \ldots \cup \Sigma_k} : \alpha \in (\Sigma_{i-1} \cup \ldots \cup \Sigma_k)^* \text{ is in } \mathcal{A}_{i-1} \text{ and } \alpha\downarrow_{\Sigma_{i-1}} \text{ is in } SUV_{i-1}\}$
and $U_i$ accepts the language $\mathcal{U}_i = \{\alpha\downarrow_{\Sigma_i} : \alpha \in (\Sigma_i \cup \ldots \cup \Sigma_k)^* \text{ is in } \mathcal{A}_i\}$.
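The parallel simulation performed by the intermediate automaton can be sketched as below. The sketch assumes both input automata are deterministic, free of $\epsilon$-moves, and acyclic (so that enumerating the accepted words terminates); none of these assumptions is imposed by the construction itself. The representation of an automaton as a triple `(start, delta, accepting)` and the names `product_language` and `project` are hypothetical.

```python
def project(word, alphabet):
    """Drop every symbol of `word` that is not in `alphabet`."""
    return tuple(a for a in word if a in alphabet)


def product_language(aux, suv, sigma_prev):
    """Words accepted when `aux` and `suv` are simulated in parallel.

    aux, suv   -- automata as (start, delta, accepting), where delta maps
                  (state, symbol) to the next state
    sigma_prev -- interface of the previously tested black-box; `suv`
                  skips every input symbol outside this set
    """
    a_start, a_delta, a_acc = aux
    s_start, s_delta, s_acc = suv
    accepted = set()
    stack = [(a_start, s_start, ())]
    while stack:
        p, q, word = stack.pop()
        if p in a_acc and q in s_acc:
            accepted.add(word)
        for (state, sym), p_next in a_delta.items():
            if state != p:
                continue
            if sym in sigma_prev:
                if (q, sym) not in s_delta:
                    continue          # suv cannot read the symbol: dead end
                q_next = s_delta[(q, sym)]
            else:
                q_next = q            # suv skips a symbol outside its alphabet
            stack.append((p_next, q_next, word + (sym,)))
    return accepted


# Hypothetical instance: aux accepts {a c, b c}; only "a" survived testing.
aux = (0, {(0, "a"): 1, (0, "b"): 2, (1, "c"): 3, (2, "c"): 3}, {3})
suv = (0, {(0, "a"): 1}, {1})
m_words = product_language(aux, suv, {"a", "b"})
next_aux = {project(w, {"c"}) for w in m_words}     # language of the next auxiliary automaton
next_tests = {project(w, {"c"}) for w in next_aux}  # language of the next unit test automaton
```

In this instance the word starting with `b` is eliminated because `b` did not survive the previous step, leaving the single test sequence `c` for the next black-box.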

As we have mentioned earlier, when the empty string is accepted by the auxiliary automaton $A_i$ (a standard
membership algorithm can be used to check the acceptance), our push-in technique returns a "no"
answer to the global testing problem (i.e., the system does have a bad observable behavior) and no further
steps need to run.

4.3 Surviving Set Generation in Step i 

The surviving set $SUV_i$ is the set of all successful unit test sequences $\alpha \in \mathcal{U}_i$, i.e.,
$SUV_i = \{\alpha \in \Sigma_i^* : \alpha \in \mathcal{U}_i \text{ and } \alpha \text{ is an observable behavior of black-box } B_i\}$.

A straightforward way to obtain the set is to run the black-box testing procedure BBtest over the black-box
$B_i$ with every test sequence in $\mathcal{U}_i$. This is, however, not efficient, in particular when the set $\mathcal{U}_i$ is huge.
Observable behaviors of a unit are prefix-closed: if $\alpha$ is not an observable behavior of $B_i$, then, for any $\beta$, $\alpha\beta$
cannot be one either (i.e., the test sequence $\alpha\beta$ need not be run). With prefix-closedness and BBtest, we use the following
automata-theoretic procedure to generate the surviving set $SUV_i$.

Recall that $\mathcal{U}_i$ is a finite set of unit test sequences and, as a regular language, is accepted by the unit test
sequence automaton $U_i$. Let $m$ be the maximal length of all test sequences in $\mathcal{U}_i$ (the length can be obtained
using a standard longest path algorithm over the transition graph of automaton $U_i$). Our procedure consists
of the following $m$ jobs. Each $Job_j$, where $j$ ranges from 1 to $m$, identifies (using black-box testing) all the
successful test sequences of length $j$ that are prefixes (not necessarily proper) of some test
sequence in $\mathcal{U}_i$. To do this efficiently, the job makes use of the previous testing results in $\Theta_{j-1}$. More
precisely, each $Job_j$ has two parts (by assumption, $\Theta_0$ contains only the empty word):

- Define $P_j$ to be the set of all the prefixes with length $j$ of all the unit test sequences in $\mathcal{U}_i$. Calculate the
set $\hat{P}_j \subseteq P_j$ such that each element in $\hat{P}_j$ has a prefix (with length $j - 1$) in $\Theta_{j-1}$. To implement this
part, one can first construct an automaton (from automaton $U_i$) to accept the language $P_j$. Then, construct
another automaton to accept the set $\Theta_{j-1}$. Finally, an automaton $M$ can be constructed from these two
automata to accept the language $\hat{P}_j$. None of these constructions is difficult, and none involves testing.

- Using BBtest, generate the set $\Theta_j$ that consists of all the successful test sequences over black-box $B_i$
in $\hat{P}_j$. Hence, one only runs test sequences in $\hat{P}_j$ instead of the entire $P_j$, thanks to the previous testing
results in $\Theta_{j-1}$.

It is left to the reader to verify that, after the jobs are completed, the surviving set $SUV_i$ can be obtained as
$\mathcal{U}_i \cap (\bigcup_{0 \le j \le m} \Theta_j)$. Again, this set can be accepted by an automaton, treated as a symbolic representation of
the set, constructed from automaton $U_i$ and the automata built in the above jobs to accept $\Theta_j$, $1 \le j \le m$.
One can choose the procedure to output the explicit set $SUV_i$ or its symbolic representation $suv_i$.
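
The job structure can be sketched on explicit sets as follows. This is an illustrative sketch only: `bbtest` is a hypothetical oracle standing in for the BBtest procedure (it returns whether a sequence is an observable behavior of the black-box and is assumed to be prefix-closed), and explicit sets of tuples stand in for the automata.

```python
def surviving_set(U, bbtest):
    """Return (SUV, number of unit tests run) for a finite set U of test sequences.

    U      : finite set of test sequences, each a tuple of events
    bbtest : hypothetical oracle, bbtest(seq) -> bool, assumed prefix-closed
    """
    m = max((len(w) for w in U), default=0)
    theta = {()}                  # Theta_0 contains only the empty word
    all_theta = set(theta)
    tests_run = 0
    for j in range(1, m + 1):     # Job_j
        # Part 1: prefixes of length j whose length-(j-1) prefix survived.
        P_j = {w[:j] for w in U if len(w) >= j}
        P_hat = {p for p in P_j if p[:-1] in theta}
        # Part 2: only the sequences in P_hat are actually tested.
        theta = set()
        for p in P_hat:
            tests_run += 1
            if bbtest(p):
                theta.add(p)
        all_theta |= theta
    return U & all_theta, tests_run
```

A sequence whose shorter prefix has already failed never reaches the oracle, which is where the savings over testing every element of the set come from.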

4.4 Correctness and Bad Behavior Generation 

Since the global testing problem is equivalent to the emptiness of class C, we only need to show that the 
emptiness is answered correctly with the push-in technique. Clearly, the technique always terminates with a 
yes/no answer. It returns "yes" only at some Step $i$, $1 \le i \le k$, whose surviving set $SUV_i = \emptyset$. It returns
"no" only

CASE1. at some Step $i$, $1 \le i \le k$, when the auxiliary automaton $A_i$ accepts the empty word, or

CASE2. at the last Step $k$ when $SUV_k \neq \emptyset$.
In these two cases, in order to demonstrate a global bad behavior of the system, we first define an operation
called $select_j(\cdot)$, $1 \le j \le k$. Given a sequence $\alpha_j$, the operation returns a sequence $\alpha_{j-1}$ (when $j = 1$,
it simply returns $\alpha_j$) satisfying the following conditions: $\alpha_{j-1} \in \mathcal{A}_{j-1}$, $\alpha_{j-1}\downarrow_{\Sigma_{j-1}} \in SUV_{j-1}$, and
$\alpha_{j-1}\downarrow_{\Sigma_j \cup \ldots \cup \Sigma_k} = \alpha_j$. The returned sequence $\alpha_{j-1}$ may not be unique. In this case, any sequence (such as a
shortest one) satisfying the conditions will be fine. Now, we define another operation called $BadGen_j(\cdot)$,
$1 \le j \le k$, as follows. Given a sequence $\alpha_j$, we first calculate $\alpha_{j-1} = select_j(\alpha_j)$. Then, we calculate
$\alpha_{j-2} = select_{j-1}(\alpha_{j-1})$, and so on. Finally, we obtain $\alpha_1$. At this time, the operation $BadGen_j(\alpha_j)$
returns any sequence $\alpha$ satisfying the following conditions: $\alpha$ is accepted by $M_{gl,bad}$ and $\alpha\downarrow_{\Sigma_1 \cup \ldots \cup \Sigma_k} = \alpha_1$.
All these operations can be easily implemented through automata constructions.

Coming back to bad behavior generation, in CASE1, we return $BadGen_i(\Lambda)$ (where $\Lambda$ is the empty
sequence) as a global bad behavior. In CASE2, we simply pick any sequence $\alpha_k$ from $SUV_k$ and return
$BadGen_k(\alpha_k)$ as a global bad behavior.
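
The backward chaining performed by these two operations can be sketched on explicit finite sets. This is a hedged illustration rather than the automata-based implementation: the function names, the dictionaries `A`, `SUV`, and `sigma` (indexed by step number), and the set `M_words` (the words accepted by the automaton used in Step 1) are all assumptions made for the example.

```python
def project(word, alphabet):
    """Restrict a word (a tuple of events) to the events in `alphabet`."""
    return tuple(e for e in word if e in alphabet)


def select(alpha_j, j, A, SUV, sigma):
    """select_j: a shortest alpha_{j-1} consistent with alpha_j (alpha_j itself if j == 1)."""
    if j == 1:
        return alpha_j
    rest = set().union(*(sigma[t] for t in sigma if t >= j))
    candidates = [w for w in A[j - 1]
                  if project(w, sigma[j - 1]) in SUV[j - 1]
                  and project(w, rest) == alpha_j]
    return min(candidates, key=len)   # any candidate would do


def bad_gen(alpha_j, j, A, SUV, sigma, M_words):
    """BadGen_j: chain select_j, ..., select_2 down to alpha_1, then lift to a global word."""
    alpha = alpha_j
    for step in range(j, 1, -1):
        alpha = select(alpha, step, A, SUV, sigma)
    all_bb = set().union(*sigma.values())
    return min((w for w in M_words if project(w, all_bb) == alpha), key=len)
```

For instance, with two black-boxes, `sigma = {1: {"a"}, 2: {"x"}}`, and a gluer-only event `g`, calling `bad_gen((), 2, ...)` corresponds to CASE1 at Step 2.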

One can show that our technique is indeed correct: 

Theorem 2. If the class C is empty then the push-in technique returns "yes", otherwise it returns "no".
When the technique returns "yes", it shows that the system does not have any of the global bad behaviors in
Bad; otherwise it indicates that the system does exhibit bad behaviors in Bad.

In each step of our algorithm, one can use standard algorithms in automata theory to make the obtained
automata, such as the $U_i$'s and $A_i$'s, smaller. The algorithms include eliminating unreachable states and/or minimization.
Additionally, the algorithms as well as all the automata constructions mentioned in the push-in technique
can be implemented using existing automata manipulation tools like Grail.



From the correctness theorem, we know that the push-in technique is sound and complete. However,
one question still remains: does each $\mathcal{U}_i$ contain more test sequences (for black-box $B_i$) than are necessary
for solving the global testing problem? We can show that each $\mathcal{U}_i$ derived from our push-in technique is
"optimal" in the following sense. Suppose that we have completed the first $i - 1$ Steps (i.e., the black-boxes
$B_1, \ldots, B_{i-1}$ have been tested) and have obtained $\mathcal{U}_i$ to start the subsequent steps (i.e., the remaining
black-boxes $B_i, \ldots, B_k$ are not tested yet). Each test sequence $\alpha_i$ in $\mathcal{U}_i$ has to be run, since one can show the
following two statements: there are black-boxes $B_i^*, \ldots, B_k^*$ such that $\alpha_i$ is a successful (resp. unsuccessful)
test sequence for $B_i^*$ and the system $G(B_1, \ldots, B_{i-1}, B_i^*, \ldots, B_k^*)$ has (resp. does not have) a global bad
behavior.
Table 1. Experiment Results: Counts of Test Sequences



5 Experiments 

All the experiments were performed on a PC with an 800MHz Pentium III CPU and 128MB memory. The Grail
tool was used to perform almost all the automata operations (see footnote 3). The entire experiment process was driven
by a Perl script and carried out automatically. Our experiments were run on the system of black-boxes shown
in the earlier system figure. In the experiments, we designated black-boxes Timer, Sensor and Comm as $B_1$, $B_2$, and $B_3$,
respectively. The internal implementations of the black-boxes, on which the unit testing in the experiments
was performed, are shown in Figures 3, 4, and 5. We ran twelve experiments in total (each experiment is
a complete execution of the push-in technique), divided into four cases. Each of the four cases
consists of three experiments, which are described in detail as follows.

3 We implemented (in C) three additional operations to manipulate automata with $\epsilon$-moves and to count the number of
words in a finite language accepted by an automaton, which are not provided in Grail.

Case 1 First, we require that once a pause event takes place, no send occurs until
a resume occurs. The corresponding bad behaviors are specified by the regular expression
$\Sigma^* p (\Sigma - \{r\})^* s \Sigma^*$, where $\Sigma$ is the set of all the twelve events in the system; $p$, $r$, and $s$ stand for pause,
resume, and send, respectively (such abbreviations will be used throughout this section). For the first
experiment run in this case, we chose Bad to be all words in the regular expression that are not longer
than 10 (denoted by "maxlength=10"). The remaining two experiments were run with "maxlength=20" and
"maxlength=30", respectively. To understand the results shown in Table 1, we go through the third
experiment (i.e., "maxlength=30"). The results of the experiment are shown in the box at the upper right corner
of the table (i.e., under the four columns associated with "maxlength=30" and in the three rows ("step 1",
"step 2", "step 3") associated with "case 1"). The three steps in the experiment correspond to the three
Steps (since there are three black-boxes) in the push-in technique. The auxiliary automaton $A_1$ calculated
in Step 1 accepts $\#A_1 = 2.16 \times 10^{24}$ test sequences in total. The unit test sequence automaton $U_1$ accepts
$\#U_1 = 4.14 \times 10^7$ test sequences. Using the black-box testing procedure in Section 4.3, we actually only
performed $TC_1 = 2.87 \times 10^5$ unit tests over $B_1$ (the Timer), among which $\#SUV_1 = 2.23 \times 10^5$ tests
survived. In Step 2 and Step 3, we obtained $\#A_2$, $\#U_2$, $\#A_3$, $\#U_3$ similarly, as shown in the table. In
particular, we actually performed $TC_2 = 2940$ unit tests over the Sensor in Step 2 and $TC_3 = 1577$ unit tests
over the Comm in Step 3. Since the last surviving set $SUV_3$ is not empty ($\#SUV_3 = 274$), the experiment
detects a global bad behavior specified in this case.
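
To make the Case 1 specification concrete, membership of a word in Bad can be checked with a single left-to-right scan. The sketch below is an illustration only; the function name `is_case1_bad` is hypothetical, and events are written out as the strings `pause`, `resume`, and `send` rather than the one-letter abbreviations.

```python
def is_case1_bad(word, max_length):
    """True if `word` matches the Case 1 pattern and is at most max_length long.

    The pattern holds when some pause is later followed by a send
    with no resume in between.
    """
    if len(word) > max_length:
        return False
    paused = False            # True after a pause with no resume since
    for event in word:
        if event == "pause":
            paused = True
        elif event == "resume":
            paused = False
        elif event == "send" and paused:
            return True
    return False
```

For example, the twelve-event sequence `fire data send msg fire data send cerr fire data pause send` satisfies the pattern, so it belongs to Bad for "maxlength=20" and "maxlength=30" but not for "maxlength=10".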

Notice that the total number of unit tests run in this experiment is $TC_1 + TC_2 + TC_3$, which is not more
than $2.92 \times 10^5$. This number essentially indicates the actual "cost" of the experiment in deciding whether
there is a global bad behavior specified in the case whose length is bounded by 30. The cost is small
compared with the astronomical number $\#A_1 = 2.16 \times 10^{24}$, which would be the number of integration
test sequences if one ran integration testing, since $M_{gl,bad} = A_1$ in this system. The other two experiments
("maxlength=10" and "maxlength=20") also detected a global bad behavior; their results are shown in the first
three rows under "maxlength=10" and "maxlength=20" in Table 1. The costs of these two experiments,
148 and 5262 respectively, are much smaller.
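
As a quick check of the "maxlength=30" figures quoted above (with $TC_1$ taken at its rounded value, so the sum is approximate), the cost and its ratio to the number of integration test sequences work out as:

```latex
\begin{align*}
TC_1 + TC_2 + TC_3 &\approx 2.87 \times 10^{5} + 2940 + 1577
                    = 291517 \approx 2.92 \times 10^{5}, \\
\frac{TC_1 + TC_2 + TC_3}{\#A_1} &\approx \frac{2.92 \times 10^{5}}{2.16 \times 10^{24}}
                    \approx 1.35 \times 10^{-19}.
\end{align*}
```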

Case 2 The bad behaviors detected in Case 1 are due to the concurrent nature of the black-boxes: a fire was
issued before the pause was sent to Timer, which eventually led to another send. For instance, a global
bad behavior could be like the following: fire data send msg fire data send cerr fire data pause send.
From this observation, we believed that the system might also have other bad behaviors: after a cerr takes
place, there could be another cerr coming before a resume occurs. Such bad behaviors are encoded by
$\Sigma^* c (\Sigma - \{r\})^* c \Sigma^*$, where $c$ stands for cerr. The three experiments in this case, however, did not detect such bad behaviors (i.e.,
$\#SUV_3 = 0$ for all lengths, shown in the third row "step 3" associated with "case 2" in Table 1).

Case 3 Based upon the experiments in the previous case, we carefully studied the system and realized that
the implementation of Comm might be wrong: after an error occurs (i.e., a cerr is output), Comm is supposed
to retain its state prior to the output of the cerr, but it does not. After correcting this bug (by making the
internal implementation of Comm, shown in Figure 5, move to state s2 instead of s0 after a cerr is output),
we ran the three experiments again. The experiments detected bad behaviors only with length more
than 10 (i.e., $\#SUV_3 = 0$ when maxlength is 10 and $\#SUV_3 > 0$ when maxlength is 20 and 30, shown in
Table 1).

Case 4 Now we want to test the following: after an error occurs in Sensor (i.e., a serr is issued), there will be
at most one more fire issued before a resume occurs. The corresponding bad behaviors are encoded by
$\Sigma^* serr (\Sigma - \{r\})^* f (\Sigma - \{r\})^* f (\Sigma - \{r\})^* r \Sigma^*$, where $f$ stands for fire. Our experiments did not detect
any such behaviors for any of the three choices of maxlength: 10, 20, 30. In fact, in the experiments, no testing
over Comm was needed. This is because, as shown in the last three rows of Table 1, $\#SUV_2$ is 0 for all the
three choices.

We measured the total time that our script used for automata manipulations in each of the twelve experiments,
shown in Table 2. In the table, the "result" shows whether a global bad behavior was detected in an
experiment; i.e., "×" (resp. "√") indicates "detected" (resp. "not detected"). As shown in the table, the total
time is within a minute for all the four experiments with "maxlength=10". For "maxlength=20", the time is
still acceptable (within an hour). When the maxlength is increased to 30, the time is still within our patience
(which was set to be 24 hours). Yet, our script could not finish within the patience for any experiment when
we tried to push maxlength to 40. Even though determinization and minimization are optional in our push-in
technique, we made them mandatory in our experiments. In this way, we can cross-compare the sizes of the
automata obtained in each step of the experiments. The largest of all the automata constructed in the
twelve experiments, after determinization and minimization, has 726 states and 2138 transitions. In an
experiment with maxlength=40, the script tried to make an automaton (with 1182 states) deterministic and
failed to do so within our patience.

Exhaustive integration testing over a concurrent system is in general infeasible. However, the experiments
show that, using the push-in technique, we can completely solve the global testing problem with a
substantially smaller number of tests, each run over an individual black-box only, even for an extremely large set
Bad. For instance, the total number of unit tests (the $TC_i$'s) performed in each of the four experiments with
"maxlength=30" is in the order of $10^5$, while each Bad is in the order of $10^{24}$ (notice that each Bad is always
larger than each $\#A_1$ shown in Table 1).
6 Future Work

This paper presents an automata-theoretic and decompositional technique for testing a system of concurrent
black-boxes, which is automatic, sound, and complete. Our technique can be generalized to many other forms
of bad behavior specifications (i.e., the finite set Bad). For instance, we may specify that Bad consists of
all observable sequences not longer than 40, each of which can make the gluer enter a given (undesired) state.
But the exact formalisms for bad behavior specifications need further investigation. Our model of the system
is based on synchronized communications. Therefore, it would be interesting to see whether the approach can
be generalized to some forms of asynchronous (e.g., shared-variable) systems. Black-boxes in our model are
event-driven; it is also worthwhile to study other decompositional testing approaches for data-driven black-boxes.
Sometimes, our push-in technique fails to complete, due to an extremely large bad behavior set Bad
(e.g., our experiments with "maxlength=40" shown earlier, whose global test sequences deduced from Bad
number roughly $10^{33}$). In this case, we need to study methods to (symbolically) partition the set into
smaller subsets such that the push-in technique can be run over each smaller subset. In this way, a global bad
behavior could instead be found. In our definition of the push-in technique, there is no pre-defined ordering
for testing the black-boxes. For instance, in our experiments, the ordering was Timer, Sensor, Comm, based
on the size of a black-box's interface. Clearly, more studies are needed to clarify the relationship between the
efficiency of our technique and the choice of the ordering.